NSF PAR Search | NSF Public Access Repository

Toward an Education Hub Linking Research Data and Compute to Learning Workflows in the National Data Platform

https://doi.org/10.1145/3708035.3736089

Floca, Melissa; O'Laughlin, Kate; Ramonetti Vega, Pedro; Gupta, Amarnath; Altintas, Ilkay; Parashar, Manish (July 2025, ACM)

Free, publicly-accessible full text available July 18, 2026
CrossPrefetch: Accelerating I/O Prefetching for Modern Storage

https://doi.org/10.1145/3617232.3624872

Garg, Shaleen; Zhang, Jian; Pitchumani, Rekha; Parashar, Manish; Xie, Bing; Kannan, Sudarsun (April 2024, ACM)

We introduce CrossPrefetch, a novel cross-layered I/O prefetching mechanism that operates across the OS and a user-level runtime to achieve optimal performance. Existing OS prefetching mechanisms suffer from rigid interfaces that do not provide information to applications on the prefetch effectiveness, suffer from high concurrency bottlenecks, and are inefficient in utilizing available system memory. CrossPrefetch addresses these limitations by dividing responsibilities between the OS and runtime, minimizing overhead, and achieving fewer cache misses, lower lock contention, and higher I/O performance. CrossPrefetch tackles the limitations of rigid OS prefetching interfaces by maintaining and exporting cache state and prefetch effectiveness to user-level runtimes. It also addresses scalability and concurrency bottlenecks by distinguishing between regular I/O and prefetch operation paths and introduces fine-grained prefetch indexing for shared files. Finally, CrossPrefetch designs low-interference access pattern prediction combined with support for adaptive and aggressive techniques to exploit memory capacity and storage bandwidth. Our evaluation of CrossPrefetch, encompassing microbenchmarks, macrobenchmarks, and real-world workloads, illustrates performance gains of up to 1.22x-3.7x in I/O throughput. We also evaluate CrossPrefetch across different file systems and local and remote storage configurations.
Full Text Available
Toward Democratizing Access to Science Data: Introducing the National Data Platform

https://doi.org/10.1109/e-Science58273.2023.10254930

Parashar, Manish; Altintas, Ilkay (October 2023, IEEE)

Full Text Available
Accelerating Data-Intensive Seismic Research Through Parallel Workflow Optimization and Federated Cyberinfrastructure

https://doi.org/10.1145/3624062.3624276

Adair, Marcus; Rodero, Ivan; Parashar, Manish; Melgar, Diego (November 2023, SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis)

Earthquake early warning systems use synthetic data from simulation frameworks like MudPy to train models for predicting the magnitudes of large earthquakes. MudPy, although powerful, has limitations: a lengthy simulation time to generate the required data, lack of user-friendliness, and no platform for discovering and sharing its data. We introduce FakeQuakes DAGMan Workflow (FDW), which utilizes Open Science Grid (OSG) for parallel computations to accelerate and streamline MudPy simulations. FDW significantly reduces runtime and increases throughput compared to a single-machine setup. Using FDW, we also explore partitioned parallel HTCondor DAGMan workflows to enhance OSG efficiency. Additionally, we investigate leveraging cyberinfrastructure, such as Virtual Data Collaboratory (VDC), for enhancing MudPy and OSG. Specifically, we simulate using Cloud bursting policies to enforce FDW job-offloading to VDC during OSG peak demand, addressing shared resource issues and user goals; we also discuss VDC’s value in facilitating a platform for broad access to MudPy products.
Full Text Available
Toward Democratizing Access to Facilities Data: A Framework for Intelligent Data Discovery and Delivery

https://doi.org/10.1109/MCSE.2022.3179408

Qin, Yubo; Rodero, Ivan; Parashar, Manish (May 2022, Computing in Science & Engineering)

Full Text Available
Facilitating Data Discovery for Large-scale Science Facilities using Knowledge Networks

https://doi.org/10.1109/IPDPS49936.2021.00073

Qin, Yubo; Rodero, Ivan; Parashar, Manish (May 2021, 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS))
Large-scale multiuser scientific facilities, such as geographically distributed observatories, remote instruments, and experimental platforms, represent some of the largest national investments and can enable dramatic advances across many areas of science. Recent examples of such advances include the detection of gravitational waves and the imaging of a black hole’s event horizon. However, as the number of such facilities and their users grow, along with the complexity, diversity, and volumes of their data products, finding and accessing relevant data is becoming increasingly challenging, limiting the potential impact of facilities. These challenges are further amplified as scientists and application workflows increasingly try to integrate facilities’ data from diverse domains. In this paper, we leverage concepts underlying recommender systems, which are extremely effective in e-commerce, to address these data-discovery and data-access challenges for large-scale distributed scientific facilities. We first analyze data from facilities and identify and model user-query patterns in terms of facility location and spatial localities, domain-specific data models, and user associations. We then use this analysis to generate a knowledge graph and develop the collaborative knowledge-aware graph attention network (CKAT) recommendation model, which leverages graph neural networks (GNNs) to explicitly encode the collaborative signals through propagation and combine them with knowledge associations. Moreover, we integrate a knowledge-aware neural attention mechanism to enable the CKAT to pay more attention to key information while reducing irrelevant noise, thereby increasing the accuracy of the recommendations. We apply the proposed model on two real-world facility datasets and empirically demonstrate that the CKAT can effectively facilitate data discovery, significantly outperforming several compelling state-of-the-art baseline models.
Full Text Available
Leveraging user access patterns and advanced cyberinfrastructure to accelerate data delivery from shared-use scientific observatories

https://doi.org/10.1016/j.future.2021.03.004

Qin, Yubo; Rodero, Ivan; Simonet, Anthony; Meertens, Charles; Reiner, Daniel; Riley, James; Parashar, Manish (September 2021, Future Generation Computer Systems)
Full Text Available
Harnessing the Computing Continuum for Urgent Science

https://doi.org/10.1145/3439602.3439618

Balouek-Thomert, Daniel; Rodero, Ivan; Parashar, Manish (November 2020, ACM SIGMETRICS Performance Evaluation Review)

Full Text Available
The Need for Precise and Efficient Memory Capacity Budgeting

https://doi.org/10.1145/3422575.3422791

Garg, Shaleen; Kannan, Sudarsun; Parashar, Manish (September 2020, MEMSYS 2020: The International Symposium on Memory Systems)

Full Text Available
Scalable Crash Consistency for Staging-based In-situ Scientific Workflows

https://doi.org/10.1109/IPDPSW50202.2020.00068

Duan, Shaohua; Parashar, Manish (May 2020, 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW))
Full Text Available
